Tests4Py: A Benchmark for System Testing
Benchmarks are among the main drivers of progress in software engineering
research, especially in software testing and debugging. However, current
benchmarks in this field are not well suited to specific research tasks, as
they rely on weak system oracles like crash detection, come with only a few
unit tests, need more elaborative research, or cannot verify the outcome of
system tests.
Our Tests4Py benchmark addresses these issues. It is derived from the popular
BugsInPy benchmark, including 30 bugs from 5 real-world Python applications.
Each subject in Tests4Py comes with an oracle to verify the functional
correctness of system inputs. It also enables the generation of system
tests and unit tests, allowing for qualitative studies that investigate
essential aspects of test sets, as well as extensive evaluations. These opportunities
make Tests4Py a next-generation benchmark for research in test generation,
debugging, and automatic program repair.
Comment: 5 pages, 4 figures